Data Challenge 5: Bringing It All Together

Rocio Jaime, Sophie Li, Ester Zhao

Recommendations for disaster recovery

St. Himark Timeline of Events

View timeline on Google Sheets

Suggested Resource Allocation Using Rumble App Data

The Rumble app begins to receive a small wave of damage reports on April 6, 2020 at 16:00, shortly after the initial shock. A substantial surge of Rumble reports follows on April 8th at 9:00am, primarily from Old Town and the southern neighborhoods closer to the coast (Broadview, Chaparral, and Scenic Vista). Old Town sent the most reports, and the category it reported most often was power. Because there were already efforts to modernize the electrical distribution system, it is possible that the outdated infrastructure was especially vulnerable to the earthquake. The Old Town area was also nearest to the epicenter of the earthquake’s aftershock. The southern neighborhoods had a greater number of reports pertaining to water and sewage.

We see a second surge of reports on the afternoon of April 9th, most likely because a power outage delayed the receipt of reports. It is possible that many of these reports were sent between 3:00am and 12:00 on April 9th. During this window, few to no reports came in from nearly all neighborhoods in St. Himark. Based on Rumble data alone, we’d recommend sending emergency services to Old Town (3), Pepper Mill (12), and the southernmost neighborhoods -- Broadview, Chaparral, and Scenic Vista -- due to the significantly higher volume of Rumble reports.
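As a rough sketch of how a reporting gap like this could be located, the snippet below counts reports per neighborhood per hour and lists the hours with no reports at all. It assumes each report has already been reduced to a (timestamp, neighborhood) pair; the function names and the sample records are hypothetical and are not taken from our notebooks.

```python
from collections import Counter
from datetime import datetime, timedelta


def hourly_counts(reports):
    """Count reports per (neighborhood, hour) from (timestamp, neighborhood) pairs."""
    counts = Counter()
    for ts, hood in reports:
        hour = ts.replace(minute=0, second=0, microsecond=0)
        counts[(hood, hour)] += 1
    return counts


def silent_hours(counts, hood, start, end):
    """Return every hour in [start, end) with zero reports from `hood`."""
    gaps = []
    hour = start
    while hour < end:
        if counts[(hood, hour)] == 0:  # Counter returns 0 for unseen keys
            gaps.append(hour)
        hour += timedelta(hours=1)
    return gaps


# Made-up sample records, for illustration only.
reports = [
    (datetime(2020, 4, 9, 2, 15), "Old Town"),
    (datetime(2020, 4, 9, 2, 40), "Old Town"),
    (datetime(2020, 4, 9, 13, 5), "Old Town"),
]
counts = hourly_counts(reports)
gaps = silent_hours(
    counts, "Old Town", datetime(2020, 4, 9, 2), datetime(2020, 4, 9, 14)
)
```

A long run of silent hours shared by nearly every neighborhood points to a collection problem, such as a power outage, rather than to a genuine lull in damage.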

Y*Int Social Media Data and Possible Usage

Many citizens of St. Himark use the social media app Y*int to communicate with their community. We decided to look into whether Y*int could be a viable way for city officials to survey the wellbeing of the population and potentially anticipate needs. In particular, we wanted to see if Y*int surveillance could be used as an alternative when the Rumble app experienced outages due to power loss.

This initial graph displays the Y*int messages by neighborhood and shows a large spike midday on April 8th, when the largest and most disruptive earthquake event occurred.

In order to make raw Y*int data useful, we created functions that clean the data to remove most bots and business accounts that are not relevant. The file containing these functions can be found at clean_yint_data.ipynb.

Using these functions, officials can pass in new Y*int data files to receive updated graphs that display messages related to earthquakes or disasters. For example, the graph below displays the earthquake-related messages from the week of April 6th by neighborhood.
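The cleaning and filtering idea can be illustrated with a minimal sketch. This is not the code in clean_yint_data.ipynb: the record layout (dicts with 'account', 'location', and 'message' keys), the repetition heuristic for spotting bots and business accounts, and the keyword list are all assumptions made for the example.

```python
from collections import Counter

# Hypothetical keyword list; the terms used in the notebook are not given here.
QUAKE_KEYWORDS = ("earthquake", "quake", "aftershock", "shaking", "tremor")


def drop_repetitive_accounts(messages, max_repeats=3):
    """Drop accounts that post the same text more than `max_repeats` times.

    Assumes repeated identical posts are a sign of a bot or business account.
    """
    repeats = Counter(
        (m["account"], m["message"].strip().lower()) for m in messages
    )
    flagged = {account for (account, _), n in repeats.items() if n > max_repeats}
    return [m for m in messages if m["account"] not in flagged]


def quake_related(messages, keywords=QUAKE_KEYWORDS):
    """Keep messages containing any keyword (crude substring match)."""
    return [m for m in messages if any(k in m["message"].lower() for k in keywords)]


# Made-up sample messages, for illustration only.
messages = [
    {"account": "DealsBot", "location": "Downtown", "message": "Buy one get one free!"}
] * 4 + [
    {"account": "resident1", "location": "Old Town", "message": "Felt a huge earthquake just now"},
    {"account": "resident2", "location": "Broadview", "message": "Lunch was great"},
]
cleaned = drop_repetitive_accounts(messages)
relevant = quake_related(cleaned)
```

Substring matching is simple but over-inclusive, so the keyword list would need tuning against real messages before the output is used to direct resources.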

In order to evaluate the usefulness of Y*int data as a potential alternative to the Rumble app, we mapped Rumble alerts and Y*int earthquake-related messages together. At midday on April 8th, we see a sharp decline in Rumble alerts, which we inferred was due to a mass power outage. Although Y*int data also experienced a series of drops during that time frame, they were less severe than the drop in Rumble alerts. As such, we conclude that while the Rumble app will be more useful for government officials to use on a regular basis for allocating resources, Y*int data created using our functions can be considered a feasible backup option.
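One simple way to put the two declines on the same scale is to express each as a fraction of its own pre-outage volume. The sketch below is illustrative only: the helper name is hypothetical and the counts are placeholders, not values from the Rumble or Y*int data.

```python
def relative_drop(before, during):
    """Fractional decline in volume from a baseline window to an outage window."""
    if before <= 0:
        raise ValueError("baseline count must be positive")
    return (before - during) / before


# Placeholder counts for illustration; substitute the real hourly totals.
rumble_drop = relative_drop(before=400, during=40)
yint_drop = relative_drop(before=300, during=150)
yint_more_resilient = yint_drop < rumble_drop
```

Comparing fractions rather than raw counts keeps the comparison fair when one data stream is normally much busier than the other.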

Finally, we also used Y*int data to create a more comprehensive timeline of the events that occurred over the past few days, for future reference. To do so, we pulled all messages sent by @AlwaysSafePowerCompany, @DOT-StHimark, and @DeptTransport.

Sensor Data Analysis

We examined the mobile and static sensor data in order to find potential radiation patterns. We plotted the mean radiation level per hour, and grouped each sensor by neighborhood. The Cheddarford, Palace Hills, and Downtown neighborhoods show moderately increasing levels of radiation starting on the evening of April 9th, while the Southwest neighborhood shows a more abrupt increase in counts per minute in the same time period.

One possible explanation for the extensive radiation levels is the location of the sensors. Many of these sensors are near hospitals -- the contamination could have been carried to these hospitals by vehicles or other means. The map below shows hospitals (in white), static sensors (in red), and the nuclear plant (in yellow).

There is also a fair amount of uncertainty in the static sensor data. The shaded region shows the 95% confidence interval for the mean cpm per hour -- in other words, we are 95% confident that the true mean cpm lies within the shaded region. The shaded region is wider at higher radiation levels than at lower ones. In addition, Sensor 15 in Safe Town goes offline from 22:00 on April 8th until 22:00 on April 10th. Although Sensor 13 also operates in Safe Town, the two sensors cannot be checked against each other for accuracy during that period. Although the nuclear plant is located in Safe Town, the background cpm in that neighborhood seems lower than in other neighborhoods. This seems counterintuitive, and we would like to investigate this sensor’s reliability further.
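For readers who want to see how an interval like this can be computed, here is a minimal sketch using a normal approximation (mean plus or minus 1.96 standard errors). This is an assumption for illustration: the plotting library that drew the shaded region may have used a different method, such as bootstrapping, and the function name and sample readings are hypothetical.

```python
from collections import defaultdict
from datetime import datetime
from math import sqrt
from statistics import mean, stdev


def hourly_mean_ci(readings, z=1.96):
    """Return {hour: (mean, low, high)} from (timestamp, cpm) pairs.

    Uses a normal approximation: mean +/- z * (sample SD / sqrt(n)).
    """
    by_hour = defaultdict(list)
    for ts, cpm in readings:
        by_hour[ts.replace(minute=0, second=0, microsecond=0)].append(cpm)

    result = {}
    for hour, values in sorted(by_hour.items()):
        if len(values) < 2:
            continue  # a sample SD needs at least two readings
        m = mean(values)
        half_width = z * stdev(values) / sqrt(len(values))
        result[hour] = (m, m - half_width, m + half_width)
    return result


# Made-up readings for a single sensor and hour, for illustration only.
sample = [
    (datetime(2020, 4, 9, 18, 0), 10.0),
    (datetime(2020, 4, 9, 18, 15), 12.0),
    (datetime(2020, 4, 9, 18, 30), 14.0),
    (datetime(2020, 4, 9, 18, 45), 16.0),
]
ci = hourly_mean_ci(sample)
```

Because the half-width scales with the standard deviation of the readings in each hour, hours with more scattered readings produce a wider band.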

Mobile sensor data analysis

We first examined the mobile sensor data by sensor id and owner, and found that the data was much noisier than the static sensors (even taken at one-minute intervals). This supplemental analysis is included in notebooks/mobile_analysis.ipynb.

Instead of examining the mobile sensor data by sensor, we decided to look at overall trends. The plot below (the right plot is a zoomed version of the full plot on the left) shows that the median radiation level measured by all sensors per day increases in the April 6th-April 10th period (hover over each boxplot to see summary statistics). In addition, the large range of the data may indicate that there are several outlier radiation readings.

We defined an outlier (an unusually high reading) as a radiation reading more than three standard deviations above the mean cpm value for that day. In order to take into account the total number of readings in a specific area, we overlaid a 10 by 10 grid (grid points spaced approximately 1.5 miles apart) onto the St. Himark map and assigned each sensor reading to the nearest grid point. The plot below shows the mean radiation measurement for each grid cluster for each day, with the size of the grid cluster corresponding to the number of measurements assigned to it. The exact mean and number of measurements can be found by hovering over each cluster.
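The two steps described above, flagging readings more than three standard deviations above the daily mean and snapping readings to a 10 by 10 grid, can be sketched as follows. This is a simplified illustration rather than the code in our notebooks: the function names, the coordinate bounds, and the sample values are assumptions.

```python
from collections import defaultdict
from statistics import mean, stdev


def high_outliers(values, n_sd=3.0):
    """Return readings more than `n_sd` sample SDs above the mean of `values`."""
    threshold = mean(values) + n_sd * stdev(values)
    return [v for v in values if v > threshold]


def nearest_grid_index(x, y, bounds, n=10):
    """Snap (x, y) to the nearest point of an n-by-n grid spanning `bounds`.

    `bounds` is (min_x, min_y, max_x, max_y); returns (column, row) indices.
    """
    min_x, min_y, max_x, max_y = bounds
    step_x = (max_x - min_x) / (n - 1)
    step_y = (max_y - min_y) / (n - 1)
    i = min(max(round((x - min_x) / step_x), 0), n - 1)  # clamp to the grid
    j = min(max(round((y - min_y) / step_y), 0), n - 1)
    return i, j


def cluster_summary(readings, bounds, n=10):
    """Return {grid index: (mean cpm, number of readings)} from (x, y, cpm) rows."""
    cells = defaultdict(list)
    for x, y, cpm in readings:
        cells[nearest_grid_index(x, y, bounds, n)].append(cpm)
    return {cell: (mean(vals), len(vals)) for cell, vals in cells.items()}


# Made-up values for illustration; one day's readings and a unit-spaced grid.
day_values = [20] * 19 + [1000]
flagged = high_outliers(day_values)

bounds = (0, 0, 9, 9)  # hypothetical map extent
summary = cluster_summary([(2.4, 6.6, 30), (2.2, 7.1, 50), (8.9, 0.2, 20)], bounds)
```

Keeping the count alongside the mean for each grid point matters because a high mean based on a handful of readings deserves less weight than one based on hundreds.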

We can see that there are a large number of high cpm measurements in Safe Town starting on April 8th, which suggests that the largest quake resulted in some damage at the nuclear power plant. Interestingly, April 9th and 10th show extremely high measurements in the Terrapin Springs/Scenic Vista area. This suggests that either the sensors in this area were malfunctioning, or that contamination from the power plant persists here after the earthquakes. In the future, we suggest that all mobile sensors be calibrated to static sensors in order to improve reliability.

Next Steps and Conclusions

The current situation in St. Himark is messy, and the unreliability of our different data streams makes it difficult to create a perfect picture of what is happening. We have done our best to make clear logical jumps using the data we have access to; however, we acknowledge that there is room for error in these conclusions.

Given more time, our team would have liked to look into creating neighborhood profiles in order to gauge the reliability of different reports. This would allow city officials to quickly determine which neighborhoods were probably experiencing power outages, and which truly had the highest priority for resource allocation.

Furthermore, we would have liked to do additional analysis on the ways that the Y*int social media network could be synthesized with other data streams to create a dynamic analysis of city conditions. A more involved online awareness would also allow the city to communicate more openly with citizens, which could boost morale and calm hysteria in case of further emergencies.

Finally, we believe that there was a lot of untapped potential in the mobile sensor dataset that we did not have time to analyze. We could analyze the movement of the sensors to determine when the cars were stopped. In addition, we would have liked to look further into the reliability of the sensors in order to determine which data are trustworthy. Because high levels of radiation are so dangerous, we suggest that officials investigate the remaining sources of radiation -- even if they may be due to malfunctioning sensors.